Enforce validDocIds consensus in upsert task generators and add includeBitmaps to validDocIdsMetadata API#18853
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18853 +/- ##
============================================
- Coverage 64.81% 64.78% -0.03%
- Complexity 1322 1343 +21
============================================
Files 3393 3393
Lines 211246 211402 +156
Branches 33208 33252 +44
============================================
+ Hits 136917 136963 +46
- Misses 63284 63367 +83
- Partials 11045 11072 +27
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
ef91178 to
8571ed6
Compare
Mirror the executor's validDocIds enforcement at generation time so inconsistent segments are never scheduled. UpsertCompactionTaskGenerator and UpsertCompactMergeTaskGenerator now validate each segment's replicas (CRC match, server health, and EQUAL/UNSAFE/MOST_VALID_DOCS consensus) via the shared MinionTaskUtils.selectValidDocIdsMetadataForConsensus, requesting includeBitmaps only for EQUAL and requiring all assigned replicas to respond for the strict modes. A new validDocIdsValidationMode config (STRICT default, EXECUTOR_ONLY) gates the generator-side checks: STRICT runs them in both generator and executor; EXECUTOR_ONLY downgrades the generator to a lenient pick and leaves the executor as the sole gate.
8571ed6 to
cac3f5e
Compare
xiangfu0
left a comment
There was a problem hiding this comment.
Found one high-signal issue; see inline comment.
| if (consensusMode == MinionConstants.ValidDocIdsConsensusMode.EQUAL) { | ||
| ValidDocIdsMetadataInfo first = usableReplicas.get(0); | ||
| RoaringBitmap consensusBitmap = deserializeBitmapOrNull(first); | ||
| if (consensusBitmap == null) { |
There was a problem hiding this comment.
This makes controller-first rolling upgrades incompatible under the new default STRICT/EQUAL path. Older servers still answer this endpoint but omit bitmap, and ValidDocIdsMetadataInfo explicitly treats that as expected for old servers; here we convert that mixed-version response into a hard skip, so upsert compaction/compact-merge task generation stops until every server is upgraded. Please keep the generator default executor-only, or add an old-server fallback before making bitmap-based prescheduling the default.
The consensus modes (EQUAL/MOST_VALID_DOCS) fetch validDocIds metadata from every replica, and EQUAL also carries the serialized bitmap in each entry, so reusing the regular numSegmentsBatchPerServerRequest (default 500) can produce very large per-request payloads. Add a shared validDocIdsConsensusFetchBatchSize knob (default 10) on UpsertCompactionTask and route both the compaction and compact-merge generators through MinionTaskUtils.resolveValidDocIdsFetchBatchSize: UNSAFE keeps the regular batch, the consensus modes use the smaller consensus batch. The smaller batch is intentional for the bitmap-bearing fetch and does not inherit a user-set numSegmentsBatchPerServerRequest.
The consensus config (validDocIdsConsensusMode, validDocIdsValidationMode, validDocIdsConsensusFetchBatchSize keys and defaults) is shared by the upsert compaction, compact-merge, and segment-refresh tasks, so it belongs at the top level of MinionConstants next to the ValidDocIdsConsensusMode and ValidDocIdsValidationMode enums rather than nested under UpsertCompactionTask. Update all references in both generators, both executors, MinionTaskUtils, and the tests. Config key string values are unchanged. The two constants that already shipped (VALID_DOC_IDS_CONSENSUS_MODE_KEY and its default) keep a @deprecated forwarding alias at the old nested location for source/binary compatibility.
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Let's not include bitmap check on the generator side. Comparing valid docs count should be good enough. Serializing back 500 (default batch size) bitmaps could be quite expensive, and likely not necessary. Valid docs count match should be good enough.
For CRC check, we should also improve it to allow data crc match. Take a look at BaseTableDataManager.hasSameCRC()
|
Yeah true I agree, in this case I changed it to 10 per batch but that is not scalable. So, going to just check the count. |
xiangfu0
left a comment
There was a problem hiding this comment.
Found a critical mixed-version issue; see inline comment.
| if (consensusMode == MinionConstants.ValidDocIdsConsensusMode.EQUAL) { | ||
| ValidDocIdsMetadataInfo first = usableReplicas.get(0); | ||
| RoaringBitmap consensusBitmap = deserializeBitmapOrNull(first); | ||
| if (consensusBitmap == null) { |
There was a problem hiding this comment.
This still breaks mixed-version rolling upgrades. With the new defaults (EQUAL + STRICT), the generator requests includeBitmaps=true. Older servers ignore that query param and return ValidDocIdsMetadataInfo without bitmap, and this branch turns that into a hard skip, so controller-first upgrades stop scheduling upsert compaction / compact-merge tasks until all servers are upgraded or operators manually flip EXECUTOR_ONLY. Pinot normally needs controller/server roll-forward compatibility without per-table config changes, so this needs a fallback for older servers or the generator-side default needs to stay EXECUTOR_ONLY.
The generator-side EQUAL consensus check now requires every replica to report the same valid doc count instead of comparing full validDocIds bitmaps. Comparing counts avoids serializing a RoaringBitmap per replica back to the controller, which is expensive for large upsert tables. The executor remains the authoritative gate and still verifies byte-identical bitmaps before compacting, so a count match that hides a set difference is scheduled-then- failed there rather than mis-compacted. This rolls back the now-unnecessary generator bitmap-fetch machinery added earlier on this branch (all unreleased): the includeBitmaps validDocIdsMetadata endpoint param, the ValidDocIdsMetadataInfo bitmap field, the ServerSegmentMetadataReader overload, and the validDocIdsConsensusFetchBatchSize config knob. Generators go back to the regular per-server fetch batch.
Summary
Make the upsert task generators reject segments with inconsistent replicas before scheduling, instead of relying on the executor to fail those tasks after the fact. Concretely, this PR:
includeBitmapsflag to thevalidDocIdsMetadataendpoint so the controller can batch-fetch each replica's validDocIds bitmap in one call.EQUAL/UNSAFE/MOST_VALID_DOCS) inUpsertCompactionTaskGeneratorandUpsertCompactMergeTaskGenerator, skipping any segment whose replicas disagree.validDocIdsValidationModeconfig (STRICTdefault /EXECUTOR_ONLY) to toggle the generator-side checks.Follow-up to #17696 (which added the executor-side consensus), per this review comment.
Background
#17696 made the upsert compaction executor reconcile validDocIds across replicas before compacting — it checks CRC, server health, and a configurable
validDocIdsConsensusMode(UNSAFE/EQUAL/MOST_VALID_DOCS), failing the task rather than letting a less-complete replica overwrite a more-complete one.The generator, however, still scheduled a task for every eligible segment, even when its replicas were inconsistent (a CRC mismatch mid-reload, an unhealthy server, or divergent validDocIds). The executor then has to fail those tasks to stay safe — which wastes a segment download and a task slot every cycle, and the same segment keeps getting re-picked and re-failed.
Default Setting:
Backward compatibility
validDocIdsValidationModedefaults toSTRICT, so generators now enforce consensus by default; setEXECUTOR_ONLYto restore the old behavior.Testing
Unit tests cover each mode through the generators —
EQUAL(agree / disagree / replica missing / CRC mismatch / unhealthy / missing bitmap),UNSAFE,MOST_VALID_DOCS— plus the config parsing/resolution helpers.